NVIDIA/Star-Attention: Efficient LLM Inference over Long Sequences

Download the model

huggingface-cli download meta-llama/Llama-3.1-8B-Instruct

Path to the Transformers Llama implementation:

/home/cjl/miniconda3/envs/star/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
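The path above is specific to one conda environment. A small sketch (standard library only) that resolves the installed `modeling_llama.py` location in any environment instead of hard-coding it:

```python
# Locate the installed Transformers Llama implementation without hard-coding
# a site-packages path; works for any conda/venv environment.
import importlib.util

try:
    spec = importlib.util.find_spec("transformers.models.llama.modeling_llama")
    print(spec.origin if spec else "modeling_llama module not found")
except ModuleNotFoundError:
    print("transformers is not installed in this environment")
```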

Launch:

  • llama-3.1-8b

    python run_babilong.py \
        -n "llama3.1_8b" \
        -p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
        -pc llama3 \
        -a star \
        -bs 200 \
        -l 8000 \
        -np 4 \
        --num_samples_per_task 1
    
    python run_babilong.py \
        -n "llama3.1_8b_babilong" \
        -p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
        -pc llama3 \
        -a star \
        -bs 4096 \
        -l 16384 \
        -np 4 \
        -t qa1
    
    python run_babilong.py \
        -n "llama3.1_8b_babilong" \
        -p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
        -pc llama3 \
        -a long \
        -bs 1024 \
        -l 16384 \
        -np 1 \
        -t qa5
    
  • llama-3-8b
    python run_ruler.py \
        -n "llama3_8b" \
        -p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
        -pc llama3 \
        -a star \
        -bs 16000 \
        -l 64000 \
        -np 4 \
        -t vt \
        --num_samples_per_task 1

Long text

python run_ruler.py \
    -n "llama3_8b" \
    -p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
    -pc llama3 \
    -a star \
    -bs 500 \
    -l 2000 \
    -np 4 \
    -t qa_1 \
    --num_samples_per_task 1
python run_ruler.py \
    -n "input" \
    -p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
    -pc llama3 \
    -a star \
    -bs 500 \
    -l 2000 \
    -np 4 \
    --num_samples_per_task 2

Launch ring (not the original ring attention)

python run_ruler.py \
    -n "llama3_8b" \
    -p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
    -pc llama3 \
    -a ring \
    -bs 1024 \
    -l 16384 \
    -np 4 \
    -t vt

Create the vt (variable tracking) data

python /home/cjl/Star-Attention/ruler/data/synthetic/variable_tracking.py \
    --save_dir /home/cjl/Star-Attention/tmp/1 \
    --save_name vt \
    --subset validation \
    --tokenizer_path /home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87 \
    --tokenizer_type hf \
    --max_seq_length 16384 \
    --tokens_to_generate 30 \
    --num_samples 5 \
    --random_seed 42 \
    --num_chains 1 \
    --num_hops 4 \
    --context_template "<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{task_template}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Memorize and track the chain(s) of variable assignment hidden in the following text.

{context}"
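The script writes the generated samples under `--save_dir`; RULER's synthetic-data writers emit JSONL, one sample per line. A minimal parser for spot-checking the output (the record field names in the example are assumptions, not RULER's documented schema):

```python
import json

def load_jsonl(text: str):
    """Parse JSONL content into a list of dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Synthetic record shaped like a RULER vt sample (field names are assumptions):
sample = '{"index": 0, "input": "...", "outputs": ["X1"], "length": 16384}'
records = load_jsonl(sample + "\n")
print(len(records), records[0]["outputs"])
```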
  • Issue 2: the RULER data failed to download

    Some of RULER's dependencies are not installed automatically via pip.

    Download the NLTK data.
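A hedged sketch of the NLTK fix: `pip install nltk` does not bundle the tokenizer data the scripts need, so it must be fetched explicitly. The package name `punkt` here is an assumption; use whatever name the failing script's `LookupError` reports.

```python
import nltk

def ensure_nltk_data(packages=("punkt",)) -> bool:
    """Download each NLTK data package that is not already present locally.

    "punkt" is an assumed default -- substitute the package named in the
    LookupError raised by the RULER data script.
    """
    ok = True
    for pkg in packages:
        try:
            nltk.data.find(f"tokenizers/{pkg}")  # already downloaded?
        except LookupError:
            ok = nltk.download(pkg) and ok  # fetch it (needs network access)
    return ok
```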
